{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 数据类型 - DataFrame\n",
    "\n",
    "\n",
    "欢迎大家回到 **Machine Learning Diary**~ 👏\n",
    "\n",
    "今天这节课主要介绍一种非常有用的数据类型，叫做\"DataFrame\"，经常缩写为\"df\"。本质上是一个表格型的数据结构，类似excel的sheet，是在数据挖掘中最常用的一个工具。\n",
    "\n",
    "说到DataFrame，就不得不提一个非常强大的包 - **pandas**。因此这节课都会用到这个包，顺便感受下如何使用包。\n",
    "\n",
    "打开Jupyter，尝试来生成一个dataframe："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   aaa  bbb  ccc\n",
      "0    1    4    7\n",
      "1    2    5    8\n",
      "2    3    6    9\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "## 创建DataFrame ##\n",
    "df = pd.DataFrame({'aaa':[1,2,3], 'bbb':[4,5,6], 'ccc':[7,8,9]})\n",
    "print(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "根据上面的结果，我们可以看出刚刚创建的 df表格 一共有3行3列。\"aaa/bbb/ccc\"是列名。最左边一列是index索引，表示顺序，从零开始计算。0表示第一行，2就表示第三行。\n",
    "\n",
    "### 取列\n",
    "下面尝试抽取表格里的列（column），经常缩写为 col 。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "取aaa列\n",
      "\n",
      "0    1\n",
      "1    2\n",
      "2    3\n",
      "Name: aaa, dtype: int64\n",
      "-------------------\n",
      "取aaa列\n",
      "\n",
      "0    1\n",
      "1    2\n",
      "2    3\n",
      "Name: aaa, dtype: int64\n"
     ]
    }
   ],
   "source": [
    "print('取aaa列\\n')\n",
    "col = df['aaa']  # 注意在列名上加引号\n",
    "print(col)\n",
    "print('-------------------')\n",
    "print('取aaa列\\n')\n",
    "col = df.aaa\n",
    "print(col)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "col就是aaa这一列的值。上面有两列，左边的一列仍旧是index索引。下面出现 dtype: int64，表示这列的数据格式都是int整数。上面是两种取列的方式，我个人还是比较喜欢第一种取列的方式。\n",
    "\n",
    "\n",
    "### 取行\n",
    "\n",
    "取列的时候因为有列名，但是每行不能都拥有一个名字，所以用索引顺序来取行。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "取第一行\n",
      "   aaa  bbb  ccc\n",
      "0    1    4    7\n",
      "-------------------\n",
      "取第一行\n",
      "   aaa  bbb  ccc\n",
      "0    1    4    7\n",
      "-------------------\n",
      "取前两行\n",
      "   aaa  bbb  ccc\n",
      "0    1    4    7\n",
      "1    2    5    8\n",
      "-------------------\n",
      "取所有的行\n",
      "   aaa  bbb  ccc\n",
      "0    1    4    7\n",
      "1    2    5    8\n",
      "2    3    6    9\n",
      "-------------------\n",
      "取倒数第一行\n",
      "   aaa  bbb  ccc\n",
      "2    3    6    9\n"
     ]
    }
   ],
   "source": [
    "print('取第一行')\n",
    "line = df[:1]\n",
    "print(line)\n",
    "print('-------------------')\n",
    "\n",
    "print('取第一行')\n",
    "line = df[0:1]\n",
    "print(line)\n",
    "print('-------------------')\n",
    "\n",
    "print('取前两行')\n",
    "line = df[:2]\n",
    "print(line)\n",
    "print('-------------------')\n",
    "\n",
    "print('取所有的行')\n",
    "line = df[:]\n",
    "print(line)\n",
    "\n",
    "print('-------------------')\n",
    "print('取倒数第一行')\n",
    "line = df[-1:]\n",
    "print(line)"
   ]
  },
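  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Slicing with `df[a:b]` is one way to select rows. pandas also provides `iloc` for position-based selection and `loc` for label-based selection. Here is a minimal sketch; the string index (`x`, `y`, `z`) is made up to show that an index can hold labels other than 0, 1, 2.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "df = pd.DataFrame(\n",
    "    {'aaa': [1, 2, 3], 'bbb': [4, 5, 6]},\n",
    "    index=['x', 'y', 'z'],       # hypothetical row labels\n",
    ")\n",
    "\n",
    "first = df.iloc[0]               # first row by position -> Series\n",
    "last_two = df.iloc[-2:]          # last two rows by position -> DataFrame\n",
    "by_label = df.loc['y']           # row whose index label is 'y' -> Series\n",
    "\n",
    "print(first.tolist())            # [1, 4]\n",
    "print(list(last_two.index))      # ['y', 'z']\n",
    "print(by_label['bbb'])           # 5\n",
    "```"
   ]
  },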
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 取对象"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "把bbb列的第二个找出来\n",
      "bbb列的第二个为： 5 \n",
      "\n",
      "进行数值替换：\n",
      "替换后，数值为： 100\n"
     ]
    }
   ],
   "source": [
    "print('把bbb列的第二个找出来')\n",
    "obj = df['bbb'][1]\n",
    "print('bbb列的第二个为：',obj ,'\\n')\n",
    "\n",
    "print('进行数值替换：')\n",
    "df['bbb'][1] = 100\n",
    "obj = df['bbb'][1]\n",
    "print('替换后，数值为：',obj)"
   ]
  },
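  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reading or writing exactly one cell, pandas also offers the scalar accessors `at` (by label) and `iat` (by position). A short sketch on the same toy table, rebuilt here so the snippet runs on its own:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "df = pd.DataFrame({'aaa': [1, 2, 3], 'bbb': [4, 5, 6], 'ccc': [7, 8, 9]})\n",
    "\n",
    "print(df.at[1, 'bbb'])      # 5  (row label 1, column 'bbb')\n",
    "print(df.iat[1, 1])         # 5  (second row, second column, by position)\n",
    "\n",
    "df.at[1, 'bbb'] = 100       # write a single cell in place\n",
    "print(df['bbb'].tolist())   # [4, 100, 6]\n",
    "```"
   ]
  },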
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 行连接（纵向：变长）- concat"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  key  value\n",
      "0   a      1\n",
      "1   b      2\n",
      "2   c      3\n",
      "---------------------\n",
      "  key  value\n",
      "0   d      4\n",
      "1   e      5\n",
      "2   f      6\n",
      "---------------------\n",
      "  key  value\n",
      "0   a      1\n",
      "1   b      2\n",
      "2   c      3\n",
      "0   d      4\n",
      "1   e      5\n",
      "2   f      6\n"
     ]
    }
   ],
   "source": [
    "df1=pd.DataFrame({'key':['a','b','c'],'value':[1,2,3]})  \n",
    "print(df1)\n",
    "print('---------------------')\n",
    "df2=pd.DataFrame({'key':['d','e','f'],'value':[4,5,6]}) \n",
    "print(df2)\n",
    "print('---------------------')\n",
    "df3=pd.concat([df1,df2])\n",
    "print(df3)"
   ]
  },
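  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that the combined table repeats the index labels 0, 1, 2. If a fresh index running from 0 is preferred, `pd.concat` accepts `ignore_index=True`. A minimal sketch; the names `kept` and `fresh` are only illustrative:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'value': [1, 2, 3]})\n",
    "df2 = pd.DataFrame({'key': ['d', 'e', 'f'], 'value': [4, 5, 6]})\n",
    "\n",
    "kept = pd.concat([df1, df2])                      # original labels are kept\n",
    "fresh = pd.concat([df1, df2], ignore_index=True)  # rows are renumbered\n",
    "\n",
    "print(list(kept.index))    # [0, 1, 2, 0, 1, 2]\n",
    "print(list(fresh.index))   # [0, 1, 2, 3, 4, 5]\n",
    "```"
   ]
  },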
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 列连接（横向：变宽）- merge"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  key  value1\n",
      "0   a       1\n",
      "1   b       2\n",
      "2   c       3\n",
      "---------------------\n",
      "  key  value2\n",
      "0   a       4\n",
      "1   b       5\n",
      "2   c       6\n",
      "---------------------\n",
      "  key  value1  value2\n",
      "0   a       1       4\n",
      "1   b       2       5\n",
      "2   c       3       6\n"
     ]
    }
   ],
   "source": [
    "df1=pd.DataFrame({'key':['a','b','c'],'value1':[1,2,3]})\n",
    "print(df1)\n",
    "print('---------------------')\n",
    "df2=pd.DataFrame({'key':['a','b','c'],'value2':[4,5,6]}) \n",
    "print(df2)\n",
    "print('---------------------')\n",
    "df3=pd.merge(df1,df2)\n",
    "print(df3)"
   ]
  },
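  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When the keys in the two tables do not match exactly, the `how` parameter of `pd.merge` decides which rows survive. A sketch with two made-up frames, `left` and `right`, that share only some keys:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "left = pd.DataFrame({'key': ['a', 'b', 'c'], 'value1': [1, 2, 3]})\n",
    "right = pd.DataFrame({'key': ['b', 'c', 'd'], 'value2': [5, 6, 7]})\n",
    "\n",
    "inner = pd.merge(left, right, on='key')               # default how='inner': keys found in both\n",
    "outer = pd.merge(left, right, on='key', how='outer')  # keys from either table\n",
    "\n",
    "print(inner['key'].tolist())               # ['b', 'c']\n",
    "print(sorted(outer['key']))                # ['a', 'b', 'c', 'd']\n",
    "print(int(outer['value1'].isna().sum()))   # 1\n",
    "```\n",
    "\n",
    "Rows that have no partner in the other table get `NaN` in the columns that table would have supplied, which is why the outer result has one missing `value1`."
   ]
  },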
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "关于DataFrame的操作可能一天一夜都说不完，之后有空再对这节课进行补充。在以后的课程中基本是与 df 天天见面的频率，会继续探索数字之间的奇妙组合。\n",
    "\n",
    "👩好啦，今天的课程就到这里啦！咱们下次见！~"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
